README: Signal-Noise Decomposition and Noise Share Estimation for SOFIA

This document describes the replication code for the signal-noise decomposition and noise share estimation analysis of the SOFIA (Secured Overnight Funding Interbank Average) reference rate. The analysis uses state-space (local level) models estimated via maximum likelihood to (i) decompose each SOFIA variant into a latent efficient rate and a transitory noise component, and (ii) estimate relative noise shares across pairs of alternative SOFIA constructions.


Authors

James Brugler, Calebe De Roure, Marta Khomyn, Max Prakoso, Talis Putnins

File overview

The replication package consists of two Jupyter notebooks (which execute the analysis) and three Python modules (which contain model definitions and helper functions). The table below summarises each file and its role.

| File | Type | Purpose |
| --- | --- | --- |
| sofia_signal_noise_extract_final.ipynb | Notebook | Estimates the univariate local level model for each SOFIA variant individually, extracting the daily smoothed state (efficient rate) and noise (measurement disturbance). |
| sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb | Notebook | Estimates the bivariate local level model for each (base, alternative) SOFIA pair, computing relative noise shares and Wald tests for equal noise variances. |
| stsp_ll_mods.py | Module | Defines and estimates the univariate local level state-space model. Called by the signal-noise extraction notebook. |
| stsp_mods.py | Module | Defines and estimates the bivariate (contemporaneous) local level state-space model. Called by the noise shares notebook. |
| noiseshares.py | Module | Computes noise share ratios from the estimated bivariate model parameters. Called by the noise shares notebook. |

Dependency structure

The two notebooks are independent of each other and can be run in either order. Each notebook calls one or more of the Python modules. The dependency graph is:

sofia_signal_noise_extract_final.ipynb
    └── stsp_ll_mods.py
            └── stspestmp_ll()          [univariate local level estimation]

sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb
    ├── stsp_mods.py
    │       └── stspestmp_contemp()     [bivariate local level estimation]
    └── noiseshares.py
            └── noiseshares_contemp()   [noise share computation]

All three .py modules must be located in the same working directory as the notebooks (or on the Python path).

Econometric models

Both analyses are built on the local level (random walk plus noise) state-space framework, estimated via maximum likelihood using the Kalman filter as implemented in statsmodels.tsa.statespace.MLEModel.

Model 1: Univariate signal-noise decomposition

File: stsp_ll_mods.py → called by sofia_signal_noise_extract_final.ipynb

For each individual reference rate $y_t$, the model is:

$$y_t = \mu_t + \beta \cdot ONR_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0,\, \sigma^2_{\varepsilon})$$

$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim N(0,\, \sigma^2_{\eta})$$

where:

  • $\mu_t$ is the latent efficient rate, modelled as a random walk.
  • $ONR_t$ is the overnight cash rate target, included as an exogenous control to absorb mechanical level shifts when the RBA changes the policy rate. This term is omitted when the ONR does not vary in the sample.
  • $\varepsilon_t$ is the transitory noise (measurement disturbance).
  • $\eta_t$ is the state innovation.

The Kalman smoother provides $E[\mu_t \mid y_1, \ldots, y_T]$ (the smoothed state) and $E[\varepsilon_t \mid y_1, \ldots, y_T]$ (the smoothed measurement disturbance) for each day $t$, enabling a full signal-noise decomposition of the observed rate.

Parameters estimated: $\sigma^2_{\eta}$ (state innovation variance), $\sigma^2_{\varepsilon}$ (noise variance), and $\beta$ (ONR coefficient, if applicable).
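The filtering and smoothing recursions behind this decomposition are short enough to write out. The sketch below is a plain NumPy Kalman filter with a Rauch-Tung-Striebel smoother for the univariate model, assuming $\sigma^2_{\eta}$, $\sigma^2_{\varepsilon}$ and $\beta$ are already known. It is an illustration, not the code in stsp_ll_mods.py (which relies on statsmodels); the function name local_level_smooth and the large-variance prior p0, used here as an approximation to the diffuse initialisation, are assumptions.

```python
import numpy as np


def local_level_smooth(y, sigma2_eta, sigma2_eps, onr=None, beta=0.0, p0=1e7):
    """Smoothed state and noise for y_t = mu_t + beta*ONR_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Variances and beta are taken as known. The prior on the first state has mean 0
    and a large variance p0, an approximation to a diffuse initialisation.
    """
    y = np.asarray(y, dtype=float)
    # Remove the (known) ONR effect so only mu_t + eps_t remains
    x = (y - beta * np.asarray(onr, dtype=float)) if onr is not None else y.copy()
    n = x.size
    a_pred, p_pred = np.empty(n), np.empty(n)   # moments of mu_t given y_1..y_{t-1}
    a_filt, p_filt = np.empty(n), np.empty(n)   # moments of mu_t given y_1..y_t

    a, p = 0.0, p0
    for t in range(n):
        if t > 0:
            p = p + sigma2_eta        # random-walk prediction (mean unchanged)
        a_pred[t], p_pred[t] = a, p
        f = p + sigma2_eps            # prediction-error variance
        k = p / f                     # Kalman gain
        a = a + k * (x[t] - a)
        p = p * (1.0 - k)
        a_filt[t], p_filt[t] = a, p

    # Rauch-Tung-Striebel backward pass: E[mu_t | y_1..y_T]
    a_smooth = a_filt.copy()
    for t in range(n - 2, -1, -1):
        j = p_filt[t] / p_pred[t + 1]
        a_smooth[t] = a_filt[t] + j * (a_smooth[t + 1] - a_pred[t + 1])

    eps_smooth = x - a_smooth         # E[eps_t | y_1..y_T]
    return a_smooth, eps_smooth
```

Because $\beta$ is treated as known here, the smoothed noise is the observed rate minus $\beta \cdot ONR_t$ minus the smoothed state, so the components add back to $y_t$ exactly. As $\sigma^2_{\eta} \to 0$ the smoothed state collapses to the sample mean; as $\sigma^2_{\varepsilon} \to 0$ it tracks the observed rate.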

Model 2: Bivariate noise share estimation

File: stsp_mods.py → called by sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb

For each pair of simultaneously observed rates $(y_{1,t},\, y_{2,t})$, the model is:

$$y_{1,t} = \mu_t + d + \beta_1 \cdot ONR_t + \varepsilon_{1,t}, \qquad \varepsilon_{1,t} \sim N(0,\, \sigma^2_1)$$

$$y_{2,t} = \mu_t + \beta_2 \cdot ONR_t + \varepsilon_{2,t}, \qquad \varepsilon_{2,t} \sim N(0,\, \sigma^2_2)$$

$$\mu_t = \mu_{t-1} + \eta_t, \qquad \eta_t \sim N(0,\, \sigma^2_{\eta})$$

where:

  • Both rates share the same latent efficient rate $\mu_t$.
  • $d$ is a constant spread (level difference) between the two rates.
  • $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are idiosyncratic noise terms with potentially different variances.

Noise shares are computed as:

$$NS_j = \frac{\sigma^2_j}{\sigma^2_1 + \sigma^2_2}$$

A noise share greater than 0.5 indicates that rate $j$ contributes more than half of the total pricing noise. A Wald test for $H_0: \sigma^2_1 = \sigma^2_2$ assesses whether the difference is statistically significant.
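A minimal sketch of the noise share and Wald calculations, assuming the two estimated noise variances and their $2 \times 2$ estimated covariance matrix are already available from the fitted bivariate model. The function names are hypothetical and are not the API of noiseshares.py. Under $H_0$ the statistic $W = (\hat\sigma^2_1 - \hat\sigma^2_2)^2 / \widehat{\mathrm{Var}}(\hat\sigma^2_1 - \hat\sigma^2_2)$ is asymptotically $\chi^2_1$, whose survival function can be written with the complementary error function.

```python
import math


def noise_shares(sigma2_1, sigma2_2):
    """NS_j = sigma2_j / (sigma2_1 + sigma2_2) for j = 1, 2."""
    total = sigma2_1 + sigma2_2
    return sigma2_1 / total, sigma2_2 / total


def wald_equal_variances(sigma2_1, sigma2_2, cov):
    """Wald test of H0: sigma2_1 = sigma2_2.

    cov is the 2x2 estimated covariance matrix of (sigma2_1, sigma2_2), assumed
    to come from the fitted model. Returns (W, p_value) with W ~ chi2(1) under H0.
    """
    diff = sigma2_1 - sigma2_2
    var_diff = cov[0][0] + cov[1][1] - 2.0 * cov[0][1]   # Var(a - b) = Va + Vb - 2Cov
    w = diff ** 2 / var_diff
    # P(chi2_1 > w) = P(|Z| > sqrt(w)) = erfc(sqrt(w / 2))
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value
```

For example, noise_shares(1.0, 3.0) gives (0.25, 0.75); with estimation variances of 0.5 on each noise variance and zero covariance between them, the Wald statistic is 4.0 and the p-value is about 0.0455.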

Parameters estimated: $\sigma^2_{\eta}$, $\sigma^2_1$, $\sigma^2_2$, $d$, and $\beta_1, \beta_2$ (if applicable).
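To make the roles of $d$ and the shared state concrete, the sketch below writes the bivariate model in generic state-space form (observation intercept, design, and covariance matrices) and simulates from it. It omits the ONR terms (equivalently $\beta_1 = \beta_2 = 0$); the function names and matrix layout are illustrative assumptions rather than the internals of stsp_mods.py.

```python
import numpy as np


def bivariate_ll_system(sigma2_eta, sigma2_1, sigma2_2, d):
    """System matrices for
        y_t  = c + Z mu_t + eps_t,   eps_t ~ N(0, H)
        mu_t = T mu_{t-1} + eta_t,   eta_t ~ N(0, Q)
    with the ONR terms omitted for brevity.
    """
    Z = np.array([[1.0], [1.0]])          # both rates load on the same mu_t
    c = np.array([d, 0.0])                # spread d enters the first equation only
    H = np.diag([sigma2_1, sigma2_2])     # independent idiosyncratic noise
    T = np.array([[1.0]])                 # random-walk transition
    Q = np.array([[sigma2_eta]])
    return Z, c, H, T, Q


def simulate_bivariate(n, sigma2_eta, sigma2_1, sigma2_2, d, mu0=0.0, seed=0):
    """Simulate n days; column 0 is y_1 (with spread d), column 1 is y_2."""
    rng = np.random.default_rng(seed)
    Z, c, H, T, Q = bivariate_ll_system(sigma2_eta, sigma2_1, sigma2_2, d)
    # T = 1, so the state is a cumulative sum of eta shocks
    mu = mu0 + np.cumsum(rng.normal(0.0, np.sqrt(Q[0, 0]), size=n))
    eps = rng.normal(0.0, np.sqrt(np.diag(H)), size=(n, 2))
    y = c + mu[:, None] * Z[:, 0] + eps
    return y, mu
```

Differencing the two columns removes $\mu_t$: without ONR terms (or with $\beta_1 = \beta_2$), $y_{1,t} - y_{2,t} = d + \varepsilon_{1,t} - \varepsilon_{2,t}$, whose variance is $\sigma^2_1 + \sigma^2_2$, the denominator of $NS_j$.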

Estimation details

Optimisation

Both models are estimated via a two-stage L-BFGS-B optimisation procedure (implemented in the optim_details() function within each module):

  1. Stage 1 (initialisation): Fit from default starting values with up to 1,000 iterations and function tolerance of $10^{-10}$. This moves the parameters away from potentially poor defaults.
  2. Stage 2 (refinement): Re-fit from the Stage 1 estimates with up to 10,000 iterations and the same tolerance, converging on the MLE.
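The two-stage procedure can be sketched with SciPy's L-BFGS-B implementation. This is an illustration under stated assumptions, not the code in optim_details(): it maximises the likelihood of the univariate model without the ONR term, uses finite-difference gradients, maps the README's function tolerance onto SciPy's ftol option, and lets the first observation initialise the state (for the local level model this matches a diffuse prior up to a constant in the log-likelihood). The names neg_loglik and fit_two_stage and the default start values are hypothetical.

```python
import math

import numpy as np
from scipy.optimize import minimize

FLOOR = 1e-7


def to_variances(theta):
    # sigma^2 = theta^2 + 1e-7, the constraint described in this README
    return np.asarray(theta, dtype=float) ** 2 + FLOOR


def neg_loglik(theta, y):
    """Negative Gaussian log-likelihood of the local level model without ONR.

    theta = (theta_eta, theta_eps) on the unconstrained scale. The first
    observation initialises the state (mean y_0, variance sigma2_eps) and is
    left out of the likelihood.
    """
    sigma2_eta, sigma2_eps = to_variances(theta)
    a, p = y[0], sigma2_eps
    ll = 0.0
    for obs in y[1:]:
        p = p + sigma2_eta                     # predict
        f = p + sigma2_eps                     # prediction-error variance
        v = obs - a                            # one-step-ahead prediction error
        ll -= 0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
        k = p / f                              # Kalman gain
        a = a + k * v                          # update
        p = p * (1.0 - k)
    return -ll


def fit_two_stage(y, theta0=(0.1, 0.1)):
    y = np.asarray(y, dtype=float)
    # Stage 1: from default start values, up to 1,000 iterations
    stage1 = minimize(neg_loglik, np.asarray(theta0, dtype=float), args=(y,),
                      method="L-BFGS-B", options={"maxiter": 1_000, "ftol": 1e-10})
    # Stage 2: restart from the Stage 1 estimates, up to 10,000 iterations
    stage2 = minimize(neg_loglik, stage1.x, args=(y,),
                      method="L-BFGS-B", options={"maxiter": 10_000, "ftol": 1e-10})
    return to_variances(stage2.x), stage2
```

Stage 2 starts from stage1.x, which is already on the unconstrained scale, so no back-transformation is needed between the stages.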

Parameter constraints

Variance parameters are constrained to be strictly positive via a squaring transformation with a floor of $10^{-7}$:

$$\sigma^2 = \tilde{\sigma}^2 + 10^{-7}$$

where $\tilde{\sigma}$ is the unconstrained parameter value from the optimiser. This ensures numerical stability in the Kalman filter.
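The constraint and its inverse can be written as a pair of small functions. statsmodels MLEModel subclasses conventionally override methods named transform_params and untransform_params for this purpose; the standalone NumPy versions below mirror that pair under the assumption that every value passed in is a variance (a spread $d$ or ONR coefficient would need to pass through unchanged).

```python
import numpy as np

FLOOR = 1e-7


def transform_params(unconstrained):
    """Optimiser values -> variances: sigma^2 = x^2 + 1e-7."""
    x = np.asarray(unconstrained, dtype=float)
    return x ** 2 + FLOOR


def untransform_params(constrained):
    """Variances -> optimiser values, e.g. to turn Stage 1 estimates into start values.

    Variances at or below the floor map to 0, so the round trip returns the floor.
    """
    s2 = np.asarray(constrained, dtype=float)
    return np.sqrt(np.maximum(s2 - FLOOR, 0.0))
```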

Initialisation

All models use diffuse initialisation for the state variable, reflecting the assumption that the initial level of the efficient rate is unknown a priori.

Parallelisation

Estimation across multiple reference rates (or rate pairs) is parallelised using Python's multiprocessing.Pool.map(). Each estimation function is designed to accept a single list argument to conform to the map() interface. The number of worker processes is set to cpu_count() - 4, leaving cores available for other processes.
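A sketch of the parallel pattern, assuming each task is a [name, y] list. To keep the example self-contained and fast, the worker returns method-of-moments estimates of the two local level variances from the autocovariances of $\Delta y_t$ instead of running the MLE; this is a swapped-in estimator for illustration (such estimates could also serve as starting values), not what stspestmp_ll() does. The helper mp_cores() applies the cpu_count() - 4 rule with a floor of one worker; all names here are hypothetical.

```python
from multiprocessing import Pool, cpu_count

import numpy as np


def mp_cores(reserve=4):
    # cpu_count() - 4 as in the notebooks, but never fewer than one worker
    return max(1, cpu_count() - reserve)


def estimate_one(task):
    """Worker taking a single list argument, as Pool.map() requires.

    task = [name, y]. Stand-in estimator (not the MLE): since
    dy_t = eta_t + eps_t - eps_{t-1}, Var(dy) = s2_eta + 2*s2_eps and
    Cov(dy_t, dy_{t-1}) = -s2_eps.
    """
    name, y = task
    dy = np.diff(np.asarray(y, dtype=float))
    dy = dy - dy.mean()
    gamma0 = float(np.mean(dy * dy))              # lag-0 autocovariance
    gamma1 = float(np.mean(dy[1:] * dy[:-1]))     # lag-1 autocovariance
    s2_eps = max(-gamma1, 0.0)
    s2_eta = max(gamma0 - 2.0 * s2_eps, 0.0)
    return name, s2_eta, s2_eps


def run_all(tasks):
    """Estimate every task in parallel; results come back in input order."""
    with Pool(processes=mp_cores()) as pool:
        return pool.map(estimate_one, tasks)
```

In a script, call run_all(tasks) under an if __name__ == "__main__": guard. Keeping worker functions in an importable .py module rather than a notebook cell matters under the spawn start method (the default on Windows and macOS), because child processes must be able to import the function by name.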

Data

Input data

All input files are expected in a Data/ subdirectory relative to the notebook working directory.

| File | Source | Contents |
| --- | --- | --- |
| f01d.xlsx | RBA Statistical Table F1 | Cash rate target (ONR) and actual overnight rate (AONIA). First 10 rows are metadata headers. |
| sofia-beta-version_april_2024_corected.xlsx | ASX | ASX-published SOFIA beta rates (VWAP and volume-weighted median). First 4 rows are headers; last 10 rows are footnotes. |
| allSOFIAslonger.xlsx | RBA | Synthetic SOFIA variants computed under different transaction filtering rules (outlier thresholds, minimum counterparty counts). |
| sofia_replicate.xlsx | RBA | RBA-replicated base SOFIA rates, used for verification against ASX-published values. |
| SOFIARelParty.xlsx | RBA | SOFIA rates computed after excluding related-party transactions, used for verification. |

Sample

The estimation sample runs from 2022-01-04 to 2025-03-19. There are four dates within this window where SOFIA is not computed due to insufficient transaction volume. These missing values are handled via a configurable fill method (default: substitute from RBA base SOFIA).
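A pandas sketch of the sample restriction and default fill, assuming a DataFrame indexed by date that contains sofia_vwap_rba and the columns to be estimated. The function name fill_missing is hypothetical; it also returns the filled dates so they can be reported alongside the results.

```python
import pandas as pd

SAMPLE_START, SAMPLE_END = "2022-01-04", "2025-03-19"


def fill_missing(df, est_cols, fill_col="sofia_vwap_rba"):
    """Restrict to the estimation window and fill gaps in est_cols from fill_col.

    Assumes df has a sorted DatetimeIndex and that fill_col itself is complete.
    Returns the filled frame and the dates on which any est_col was missing.
    """
    out = df.loc[SAMPLE_START:SAMPLE_END].copy()
    missing = out[est_cols].isna().any(axis=1)
    for col in est_cols:
        out[col] = out[col].fillna(out[fill_col])   # aligned on the date index
    return out, out.index[missing.to_numpy()]
```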

SOFIA variants analysed

The alternative SOFIA constructions (used as ref2 in the noise share analysis) differ in their transaction filtering rules:

| Variable name | Description |
| --- | --- |
| sofia_vwap_0525 | 5% lower / 25% upper outlier thresholds |
| sofia_vwap_0525_2mn | Same thresholds, plus minimum 2 counterparties |
| sofia_vwap_2525 | 25% lower / 25% upper outlier thresholds |
| sofia_vwap_2525_2mn | Same thresholds, plus minimum 2 counterparties |
| sofia_vwap_relparty | Excluding related-party transactions |

Workflow

Analysis 1: Univariate signal-noise decomposition

Notebook: sofia_signal_noise_extract_final.ipynb

  1. Load and merge data. Read the RBA cash rate data (f01d.xlsx), ASX-published SOFIA (sofia-beta-version_april_2024_corected.xlsx), and RBA synthetic SOFIA variants (allSOFIAslonger.xlsx). Merge all sources into a single DataFrame on date. Verify consistency of replicated rates via scatter plots.

  2. Prepare estimation inputs. For each reference rate to be estimated (sofia_vwap_asx, sofia_vwap_rba), construct a $T \times 1$ observation vector (optionally rounded to 2 decimal places) and a $T$-vector of overnight rates for the exogenous control.

  3. Estimate in parallel. Pass each rate's input list to stspestmp_ll() from stsp_ll_mods.py via multiprocessing.Pool.map(). The function selects the appropriate model class (with or without ONR control) based on whether the overnight rate varies in the sample, then runs the two-stage MLE.

  4. Extract and save results. The Kalman smoother provides the daily smoothed state $E[\mu_t \mid \mathbf{y}]$ and measurement disturbance $E[\varepsilon_t \mid \mathbf{y}]$. These are plotted and saved to CSV files in Outputs/.
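Steps 1 and 2 can be sketched in pandas and NumPy as below. The date and onr column names and both function names are assumptions; the check that decides whether to keep the $\beta \cdot ONR_t$ term simply asks whether the control takes more than one value in the sample.

```python
from functools import reduce

import numpy as np
import pandas as pd


def merge_on_date(frames):
    """Outer-merge a list of DataFrames that each carry a 'date' column."""
    merged = reduce(lambda left, right: pd.merge(left, right, on="date", how="outer"), frames)
    return merged.sort_values("date").reset_index(drop=True)


def build_univariate_task(df, rate, onr_col="onr", decimals=2):
    """Return [rate, y (T x 1), onr (T,), onr_varies] for one reference rate."""
    y = df[[rate]].to_numpy(dtype=float)          # T x 1 observation vector
    if decimals is not None:
        y = np.round(y, decimals)                 # optional rounding to 2 dp
    onr = df[onr_col].to_numpy(dtype=float)
    onr_varies = bool(onr.max() > onr.min())      # drop beta*ONR if the ONR is constant
    return [rate, y, onr, onr_varies]
```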

Analysis 2: Bivariate noise share estimation

Notebook: sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb

  1. Load and merge data. Same data loading and merging steps as Analysis 1, with the addition of the related-party SOFIA file for verification.

  2. Handle missing values. Identify dates where any estimation variable is missing. Apply a configurable fill method (default: substitute from sofia_vwap_rba) to ensure complete observation vectors for the Kalman filter.

  3. Plot rate spreads. For each (base, alternative) pair, plot both rates less the cash rate target over time to visualise differences in benchmark construction.

  4. Prepare estimation inputs. For each alternative rate, construct a $T \times 2$ observation matrix (column 0 = base rate, column 1 = alternative) and the overnight rate control vector.

  5. Estimate in parallel. Pass each pair's input list to stspestmp_contemp() from stsp_mods.py via multiprocessing.Pool.map(). The function estimates the bivariate local level model and returns parameters, smoothed disturbances, and a Wald test for equal noise variances.

  6. Compute noise shares. Call noiseshares_contemp() from noiseshares.py to compute $NS_j = \sigma^2_j / (\sigma^2_1 + \sigma^2_2)$ for each pair.

  7. Report results. Display noise shares, Wald test p-values, and full parameter estimates for each alternative rate.
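Step 4 can be sketched as building one task list per pair. The onr column name, the [name, Y, onr] task layout, and the function name are assumptions. The column order follows the README (column 0 = base rate, column 1 = alternative), which, if the columns map to $y_{1,t}$ and $y_{2,t}$ in order, makes $\sigma^2_1$ the base rate's noise variance and $\sigma^2_2$ the alternative's. The NaN check reflects the requirement in step 2 that observation vectors be complete.

```python
import numpy as np
import pandas as pd

ALTERNATIVES = ["sofia_vwap_0525", "sofia_vwap_0525_2mn", "sofia_vwap_2525",
                "sofia_vwap_2525_2mn", "sofia_vwap_relparty"]


def build_pair_tasks(df, base="sofia_vwap_rba", alternatives=ALTERNATIVES, onr_col="onr"):
    """One [name, Y (T x 2), onr (T,)] list per (base, alternative) pair."""
    onr = df[onr_col].to_numpy(dtype=float)
    tasks = []
    for alt in alternatives:
        Y = df[[base, alt]].to_numpy(dtype=float)   # column 0 = base, column 1 = alternative
        if np.isnan(Y).any():
            raise ValueError(f"missing values in {base}/{alt}; fill them before estimation")
        tasks.append([f"{base}_vs_{alt}", Y, onr])
    return tasks
```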

Outputs

From sofia_signal_noise_extract_final.ipynb

  • Outputs/sign_noise_{rate}_apr2025.csv: Daily signal-noise decomposition for each rate, containing columns for the date, observed rate, smoothed state, state innovation, measurement disturbance, overnight rate, and fitted value.
  • Time series plots of the smoothed state (efficient rate) for each SOFIA variant.
  • MLE estimation summaries printed to the notebook.

From sofia_vwap_rba_vs_vwap_alt_noise_shares_final.ipynb

  • Noise share table: proportion of total pricing noise attributable to each rate in each pair.
  • Wald test p-values: statistical significance of the difference in noise variances.
  • Full statsmodels estimation summaries for each bivariate model.
  • Time series plots of rate spreads (rate minus cash rate target) for visual comparison.

Software requirements

The code was developed and tested with the following packages. The notebooks print exact version numbers at runtime for reproducibility.

| Package | Role |
| --- | --- |
| Python 3.x | Runtime |
| pandas | Data manipulation and merging |
| numpy | Array operations and rounding |
| statsmodels | State-space model estimation (Kalman filter/smoother, MLE) |
| matplotlib | Plotting |
| multiprocessing | Parallel estimation across rates/pairs |
| openpyxl | Reading .xlsx input files (pandas backend) |
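One way to print the versions, as a sketch using only the standard library (importlib.metadata, Python 3.8+). The package list mirrors the table; multiprocessing is part of the standard library and has no separate version. The function name is hypothetical.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["pandas", "numpy", "statsmodels", "matplotlib", "openpyxl"]


def package_versions(packages=PACKAGES):
    """Map each package to its installed version (None if not installed)."""
    versions = {"Python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, ver in package_versions().items():
        print(f"{name:12s} {ver or 'not installed'}")
```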

How to run

  1. Place all .py modules (stsp_ll_mods.py, stsp_mods.py, noiseshares.py) in the same directory as the notebooks.
  2. Create a Data/ subdirectory containing the input files listed above.
  3. Create an Outputs/ subdirectory for CSV output.
  4. Run each notebook from top to bottom. The two notebooks are independent and can be executed in any order.
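Steps 1 to 3 can be checked with a short standard-library script. The function name is hypothetical; it creates Outputs/ if it is absent and returns the expected modules and Data/ inputs that are missing, using the file names from the tables above.

```python
from pathlib import Path

MODULES = ["stsp_ll_mods.py", "stsp_mods.py", "noiseshares.py"]
INPUTS = ["f01d.xlsx", "sofia-beta-version_april_2024_corected.xlsx",
          "allSOFIAslonger.xlsx", "sofia_replicate.xlsx", "SOFIARelParty.xlsx"]


def check_layout(root="."):
    """Create Outputs/ if needed and list any expected files that are missing."""
    root = Path(root)
    (root / "Outputs").mkdir(exist_ok=True)
    missing = [m for m in MODULES if not (root / m).is_file()]
    missing += [f"Data/{f}" for f in INPUTS if not (root / "Data" / f).is_file()]
    return missing
```

Running print(check_layout()) from the notebook directory should print an empty list before the notebooks are executed.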

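Note on parallelisation: The notebooks use multiprocessing.Pool with cpu_count() - 4 worker processes. On machines with fewer than 5 cores this evaluates to zero or a negative number, which Pool rejects with a ValueError; in that case set mp_cores manually to a value of at least 1.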